Random sampling of lattice paths with constraints, via 

transportation 
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Abstract 

We investigate Monte Carlo Markov Chain (MCMC) procedures for the random sam- 
phng of some one-dimensional lattice paths with constraints, for various constraints. We 
will see that an approach inspired by optimal transport allows us to efficiently bound the 
mixing time of the associated Markov chain. The algorithm is robust and easy to imple- 
ment, and samples an "almost" uniform path of length n in n'^'^^ steps. This bound makes 
use of a certain contraction property of the Markov chain, and is also used to derive a bound 
for the running time of Propp- Wilson's Coupling From The Past algorithm. 

1 Lattice Paths with Constraints 




We are interested in this paper in some families of one-dimensional lattices paths. We fix three 
integers n, a, 6 > 0, and consider the paths of length n, with steps +a/ — b, that is, the words of 
n letters taken in the alphabet {a, —b}. Such a word s — (si, S2, . . . , s„) is identified to the path 
S = {Si, . . . , Sn) ■— {si, si + S2, ■ ■ ■ , si + S2 + ■ ■ ■ + s,i).On the right, one sees the lattice path 
S — (1,2, 0, 1, 2, 3, 1) associated to the word s = (1, 1, —2, 1, 1, 1, —2). The problem we discuss 
here is to efficiently sample uniform (or almost uniform) paths in a sub-family An of paths, with 
Markov chains. 

To illustrate the methods and the results, we focus on three particular sub-families. 

1. Discrete meanders, denoted by which are simply the non-negative paths: S G if 
for any i < nwe have Si > 0. This example is mainly illustrative because the combinatorial 
properties of meanders make it possible to perform exact sampling very efficiently (an 
algorithm running in 0(n^'^'^) steps is given in [5], an order that we cannot get in the 
present paper). 

2. Paths with walls. A path with a wall of height h between r and s is a path such that Si > h 
for any r < i < s (see Fig. [T]for an example). These are denoted by Wn = VV„(/i,r, s), 
they appear in statistical mechanics as toy models for the analysis of random interfaces 
and polymers (see examples in [5]). 
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3. Culminating paths, denoted by C„, which are non- negative paths whose maximum is at- 
tained at the last step: for any i we have < Si < Sn- They have been introduced in [3], 
motivated in particular by the analysis of some algorithms in bioinformatics. 




Figure 1: A path of steps +1/ — 2, with a wall of height h = 6 between i = 10 and j = 15. 



Remark 1. The methods discussed here apply to any values of (a, 6), but we have in mind the 
challenging case b > a: for our three families the ratio card(y^„)/card(7'„) decreases exponen- 
tially fast, making impossible a naive rejection algorithm. 



2 Sampling with Markov chains 

We will consider Markov chains in a family An, where all the probability transitions are sym- 
metric. For a modern introduction to Markov chains, we refer to [T . Hence we are given a 
transition matrix (pij) of size \An\ x \An\ with 

= Pj,i whenever i ^ j. 

Lemma 2. // such a Markov chain is irreducible, then it admits as unique stationnary distri- 
bution the uniform distribution tt ~ TT{An) on An- 

Proof. The equality TT(i)pij = 7r(j)pj^i holds for any two vertices i,j. This shows that the 
probability distribution tt is reversible for (pij), and hence stationnary. It is unique if the chain 
is irreducible. □ 

This lemma already provides us with a scheme for sampling an almost uniform path in An, 
without knowing much about An- To do so, we define a "fiip" operator on paths. Fix an integer 
i € {1,2, . . . ,n — 1} and a path S — {Si, . . . , Sn) ; let (si, . . . , s„) be the corresponding word. 

The path (j){S,i,X) is defined as follows : if (si,Sj+i) = (—6. a) = V then these two steps 

A 

are exchanged into (a, —b) = \. The n — 2 other steps remain unchanged. For the case i — n, 
0(S,ri,t) is simply the path associated to the word 

(si, . . . , s„_i,a). 
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The path (/"(S, i, J,) is defined equally: if (s^, s^+i) = *, it turns into V . The path (t>(S, n, 
is the path associated to (si, . . . , Sn-i, —b). 

For the family An — Cn, we have to use another definition of n, t) and (t>{S, n, J,,), if we 
want the chain to be irreducible. Notice that since the maximum is reached at n, the \b/a \ + 1 
last steps are necessarily 

(a, a, ... , a) or (— fe, a, . . . , a). 

We thus define (j){S,n,^) as the path obtained by changing the \b/a] + 1 last steps into 
(a, a, ... , a) (regardless of their initial values in S) and 0(8, n, ].) as the path obtained by chang- 
ing the {b/a] + 1 last steps into (—6, a, . . . ,a) 

We are also given a probability distribution p = (pi)i<i<„, and we assume that Pi > for 
each i. We will consider a particular sequence p later on, but at this point we can take the 
uniform distribution in {1, . . . , n}. We can now describe the algorithm. 

Algorithm 1 MCMC: Approximate sampling of a path in An 
initialize S G An 

111 I2, ■ ' ' ^ i.i.d. r.v. with law p 

El, £2, • • ■ <— i.i.d. uniform r.v. in {f, J,} 

for t = 1 to T do 

if (/)(S,/(,£t) is in An then 

S^0(S, 

end if 
end for 



In other words, this algorithm performs the Markov chain in An with transition matrix 
P = (Pr,s)r seA defined as follows: 

^R,s — Pi/2, if S = 0(R, i, e) for some e and otherwise, 
Pr,r = 1 - I]s#R^R,s- 

Proposition 3. Denote by S{t) the path sampled by the t-th run of the loop in Algorithm]^ 

When t — > 00, the sequence S{t) converges in law to the uniform distribution in An- 

Proof. We have to check that the chain is aperiodic and irreducible. Aperiodicity comes from 
the (many) loops. Irreducibility will follow from Lemma [5] □ 

We choose now the distribution (pi). Instead of ~ we will use the weights defined by 
(see the plot of i 1-^ for n = 100 below): 



where kq — 
distribution. 
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The reason for which we use this particular distribution will appear in the proof of Proposition 
[51 We will then need the following relation: for each 1 < i < n — 1, 

2pi -Pt-i -Pi+i = Ko- (2) 

Remark 4. There are obviously many other Markov chains which are reversible with respect to 
the uniform measure, and some of them may seem more natural to the reader. However, such 
Markov chains are in general neither monotonous (see later Section^ nor of positive Ricci 
curvature (Section\^. The latter condition is essential for our purpose. 

2.1 Analysis of Algorithm [1] 

We could deduce from a brief glance at Algorithm [1] that time-complexity is always linear in T, 
but we have to pay attention to what is hidden behind each run of the for loop. 

• If /t < n, the time needed for the test "(/i(S, It,£t) is in An" can be considered as constant, 
since we only have to compare 0, S{i), S{n). 

• If It — n, the new value S{n) is compared with the maximum of S, which can be done in 
0(n). Fortunately, this occurs with probability pn — 0{n^^), so that the time-complexity 
of each loop is, on average, a 0(1). 

For Algorithm [1] to be efhcient, we need to know how S(T) is close in law to tt. This question 
is related to the spectral properties of the matrix P. In particular, the speed of convergence is 
governed by the spectral gap {i.e. 1 — A, where A is the largest of the modulus of the eigenvalues 
different from one, see [9] for example), but this quantity is not known in general. Some 
geometrical methods [F allow to bound from below 1 — A, but they assume a precise knowledge 
of the structure of the graph defined by the chain P. It seems that such results do not apply 
here. 

Instead, we will study the metric properties of the chain P with respect to a natural distance 
on Am and show that it satisfies a certain contraction property. 

3 Error estimates with contraction 

Going back to a more general setting, we consider a Markov chain in a finite set V , endowed 
with a metric d. For a vertice x ^ V and a transition matrix P, we denote by PS^ (resp. P^Sx) 
the law of the Markov chain associated to P at time 1 (resp. i), when starting from x. For 
x,y € V, the main assumption made on P is that there is a coupling between PSj., PSy (that is, 

a random variable {Xi,Yi) with Xi PSx, Yi PSy) such that 

E[diXi,Y,)]<il~K)d{x,y), (3) 

for some k > 0, which is called the Ricci curvature of the chain, by analogy with the Ricci 
curvature in differential geometr}0. If the inequality holds, then it implies that P admits a 
unique stationnary measure tt and that, for any x, 

II P*4-7r ||tv< (1-K)*diam(y), (4) 

^The Ricci curvature is actually the largest positive number such that (|3]l holds, for all the couplings of 
PSx, PSy ; here we should rather say that Ricci curvature is larger than k. 
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where diam(V^) is the diameter of the graph with vertices V induced by the Markov chain and 
II ■ IItv stands, as usual, for the Total Variation distance 

II Ati - ||tv:= sup Imi(^) - M2(^)| ■ 

ACV 

Hence, a positive Ricci curvature gives the exponential convergence to the stationnary measure, 
with an exact (again, exact means non-asymptotic) bound. In many situations, a smart choice 
for the coupling between Xi,X2 gives a sharp rate of convergence in eq. (j4]) (see [ID]). 

3.1 Metric properties of P 

To apply the Ricci curvature machinery, we endow each An with the L^-distance 

cii(5,5') = ^Ei^.-5,'|. 

i=0 

(Notice that \Si — Sl\ is always a multiple of a + b.) For our purpose, it is fundamental that this 
metric space is geodesic. 

Lemma 5 (Families An are geodesic). Each An, equipped with the distance di, is geodesic 
in the following sense: for any two S,T G An with di{S,T) = k, there exist fc + 1 paths 
So = S, Si, . . . , Sk — T in An such that for each i 

• di {Si, Si+i) = 1 ; 

• Si and Si^i are neighbours in the Markov chain P. 

This implies in particular that P is irreducible and that the diameter of P is smaller than 
ma,xs,Tdi{S,T) < n{n + l)/2. 

Proof. The proof goes by induction on k. We fix ^ T (and denote by s, t the corresponding 
words) ; we want to decrease di{S,T) by one, by applying the operator (f)(-,i,e) with proper i,s. 
We denote by io S {1, . . . , n} the first index for which S ^ T: 

Sq — Tq, Si — Ti, . . . , Sig-i = 5*^^ ^ Ti^. 

For instance we have Tig = Sig + a + b. Let j be the position of the left-most peak in T in 
{io -|- 1, io -I- 2, . . . , n}, if such a peak exists. Then S' :— 4>{T, j, -I) is also in An- it is immediate 
for the three families Mn, Wn,Cn. We have di{S, S') = k — 1. 

If there is no peak in T after io, then {tig+l,tig-^.2, ■ • ■ , tn) = (a, a, . . . , a). Hence we try to 
increase the final steps of S by one. To do so, we choose S' :— 0(S,n,t) ii S ^ (/)(S,n,t), or 
S' = 0(S, j, t) where j is the position of the right-most valley otherwise. 

□ 

We will show that Ricci curvature of P w.r.t. this distance is (at least) of order 

Proposition 6. For the three families A4n, C„, W„, the Ricci curvature of the associated Markov 
chain, with weights defined as in is larger than kq. 

Proof. Fix S,T in An S {A4„,C„, W„}, we first assume that S,T are neighbours, for instance 
T = (/)(S,?,t) for some i, as in Fig. [2l Let (S^,S^) be the random variable in An x An whose 
law is defined by 

{S\S^) ^^""^^ {<j>{S,I,E),<f>{T,I,E)), 
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i — 1 i i + 1 




Figure 2: The paths S, T differ at i. 



where X is a r.v. {1, . . . ,n} with distribution p and E is uniform in {'[, In other words, we 
run one loop of Algorithm [T] simuhaneously on the both paths. 

We want to show that S^, are, on average, closer than S, T. Three cases may occur: 
Case 1. I = i This occurs with probability pi and, no matter the value of E, we have = S^. 
Case 2. 2 — i — 1 or ^ + 1. We consider the case i ~ 1. Since S and T coincide everywhere 
but in i, we necessarily have one of these two cases: 

• there is a peak in S at « — 1 and neither peak nor valley in T at i — 1 (as in the figure on 
the right) ; 

• there is a valley in T at i — 1 and neither peak nor valley in S at i — 1. 

In the first case for instance, then we may have (ii(S^,S^) = 2 \i E =4, while the distance 
remains unchanged if E The case I — i + 1 is identical. This shows that with a probability 
smaller than pj_i/2 + we have di(S^, S^) = 2. 

Case 3. I 7^ i — 1, i, i + 1 In this case, S and T are possibly modified in I, but if there is 
a modification it occurs in both paths. It is immediate for the families A^„,Wn, less apparent 
for Cn- In the latter we have to check that if (/)(T,I, '[") is in C„, so is (/)(S,X, f). But this 
is true because we have maxj S'j = S'„ = T„. Hence a flip in S at I does not violate the 
maximum- at-last- position condition, because it does not violate this condition for T. 

Thus, we have proven that when S, T only differ at i 

E[di(Si,S2)] <2x (p,_i/2 + p,+i/2) + 0xp, + lx (1 - - p,_i/2 - k+i/2) (5) 
< (1-Ko) X 1 = (l-Ko)rfi(5,T). 

What makes Ricci curvature very useful is that if this inequality holds for pairs of neighbours 
then it holds for any pair, as noticed in [4]. Indeed, take k paths Sq = S, Si, . . . , Sk = T as 'm 
Lemma [5] and apply the triangular inequality for c?i : 

E [rfi (<^(5, X, E), <I>{T, I, i?))] < ^ E [di (0(5, ,I,E), 0(5,+i , I, E))] 

i=0 

<(l-Ko)fc-(l-«o)di(5,r). 

□ 

Remark 7. It is easy to exhibit some S, T such that ineq. ([5]) is in fact an equality. In the case 
where pi = 1/n, this equality reads E [di(S"'^, S^)] = di{S,T), and we cannot obtain a positive 
Ricci curvature ( though this does not prove that there is not another coupling or another distance 
for which we could get a k > in the case pi = 1/n.). 
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Combining Proposition |6] with Eq. (j4]) gives our main result. 

Theorem 8. For each family An, Algorithm{I\ returns an almost uniform sample ofir, as soon 
as T > n^: 

II S{T) - TT ||tv< nV2exp (- ) . 

\ n{n + l){n + 2j J 

3.2 Related works 

Bounding mixing times via a contraction property over the transportation metric is quite a 
standard technique, the main ideas dating back to Dobrushin (1950's). A modern introduction 
is made in [9]. For geodesic spaces, this technique has been developped in under the name 
path coupling. 

The Markov chain P on lattice paths has in fact already been introducec^l by D.Wilson [l2] 
for lattice paths with a fixed end-point (as a first step for the sampling of random tilings), with 
uniform weights pi = 1/n. The author also proves a mixing time of order logn, by showing 
that ([3]) holds with a different distance (namely, a kind of Fourier transform of the heights of 
the paths). It does not seem to us that his method can be applied for paths with our kinds of 
constraints (when the end-point is not fixed). 

More generally, we do believe that it is difficult to build a Markov chain for these kinds of 
lattice paths which has a mixing time much smaller than n'^ , with the constraint that each step 
of the chain is fast to compute (in addition to [12], see also [2] for some related results in the 
context of quasicrystals: the weights are also uniform with yet another distance). 

4 Coupling From The Past with P 

Propp- Wilson's Coupling From The Past (CFTP) [11] is a very general procedure for the exact 
sampling of the the stationnary distribution of a Markov chain. It is efficient if the chain 
is monotonous with respect to a certain order relation ^ on the set V of vertices, with two 
extremal points denoted 0, 1 {i.e. such that ^ x ^ 1 for any vertice x). This is the case here 
for our three families, with the partial order 

S ^ T iff 5, < for any i. 

For the family A4iq with a = l,b = —2 for instance, we have 

= Veanders = (1' 1' "2, 1, 1, -2, 1, 1, -2, 1), 

1 = ^meanders ^ ^' ^' 

It is easy to check that for each n, families C„ and Wn also admit extremal points 0, 1. 
We describe CFTP, with our notations, in Algorithm [2l 

We refer to ([5, Chap. 10) for a very clear introduction to CFTP, and we only outline here 
the reasons why this indeed gives an exact sampling of the stationnary distribution. 

• The output of the algorithm (if it ever ends!) is the state of the chain P which has been 
running "since time — cxd", and thus has reached stationnarity. 

• The exit condition S = T ensures that it is not worth running the chain from T steps 
earlier, since the trajectory of any lattice path ^ R 1 is "sandwiched" between those 
of 0, 1, and therefore ends at the same value. 
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Algorithm 2 CFTP: Exact sampling of a path in An 



i.i.d. r.v. with law p 
i.i.d. uniform r.v. in {t, J,} 



S ^ 0, T ^ 1 

. ..,1-2,1-1 ^ 
. . . , £-2,£-l ^ 

T = 1 

repeat 

S ^ 0, T ^ i 

for t = — r to do 

if (t){S,It,£t) is in An then S <r- 
if 0(T,/t,et) is in X then 

end for 

r ^ 2r 
until S = T 



iS,It,£t) 



-T/2 



^ output 



Figure 3: A sketchy representation of CFTP : trajectories starting from 0, 1 at time —T/2 don't 
meet before time zero, while those starting at time — T do. 



Proposition 9. Algorithmic ends with probability 1 and returns a exact sample of the uniform 
distribution over An. This takes on average 0{n^{logn)^) time units. 

We recall that CFTP has a major drawback compared to MCMC. For the algorithm to be 
correct, we have to reuse the same random variables It,£t, so that space-complexity is in fact 
linear in n'^(logn)^. This may become an issue when n is large. 

Proof. It is shown in [11) that Algorithm returns an exact sampling in C(ijj^jx logi?) runs of 
the chain, with 

*mix i * > ; sup II P*(5i, - TT ||tv< e" 

The number H is the length of the longest ordered chain of states between and 1. It is a 
consequence of the proof of Lemma [5] that H — 0{n^). Regarding t^^-^, we have, for instance 
by m, P.189), 

^mix ^ — (log(diamy) + 1) , 

hence fj^ix ~ O(n^logn). (Recall that under Section \TJ\ each test in Algorithm [5] takes, on 
average, C(l) time units.) □ 



■^It is considered in |12) 2d-paths from (0,0) to {x,y) with steps East/North. These are, up to a linear 
transformation, one-dimensional paths of length x + y with steps +x/ — y, starting and ending at zero. 
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5 Concluding remarks and simulations 



1. In FiglH we show simulations of the three kinds of paths, for n — 600. The final height of 
the culminating path is very low (about 30), it would be interesting to use our algorithms to 
investigate the behaviour of this height when n — > oo ; this question was left open in [3]. 




Figure 4: (Almost) uniform paths of length 600. From top to bottom: a culminating path, a 
meander, a path with wall (shown by an arch). 



2. One may wonder to what extent this work applies to other families An of paths. The main 
assumption is that the family of paths should be a geodesic space w.r.t. distance di. This is 
true for example if the following condition on An is fulfilled: 

{R, T e Ai and i? ^ 5* ^ T) ^ e An- 

Notice however that this is quite a strong requirement, and it is not verified for culminating 
paths for instance. 

3. A motivation to sample random paths is to make and test guesses for some functionals of these 
paths, taken on average over An- Consider a function / : An — >■ K, we want an approximate 
value of 7r(/) := card(^„)~^ ^seA fi^)^ exact value is out of reach by calculation. We 
estimate this quantity by 

'^(/):-^E/(S(0), (6) 

(recall that S{t) is the value of the chain at time t). For Algorithm [T] to be efficient in practice, 
we have to bound 

P(|7r(/)-7r(/)| >r), (7) 
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for any fixed r > 0, by a non-asymptotic (in T) quantity. This can be done with (0, Th.4-5), 
in which one can find concentration inequafities for ([7]). The sharpness of these inequahties 
depends on k, and on the geometrical structure of An. 

Aknowledgements. This work was done while I was enjoying a post-doctoral position granted 
by Anr Gamma ; many thanks to Frederique Bassino and the other members of the Anr for 
the support. I also would like to thank Elie Ruderman for the English corrections. 
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QQ , We investigate Monte Carlo Markov Chain (MCMC) procedures for the random sam- 

phng of some one-dimensional lattice paths with constraints, for various constraints. We 
will see that an approach inspired by optimal transport allows us to efficiently bound the 
mixing time of the associated Markov chain. The algorithm is robust and easy to imple- 
Qi^ ' ment, and samples an "almost" uniform path of length n in n'^'^^ steps. This bound makes 

> 



(-H I use of a certain contraction property of the Markov chain, and is also used to derive a bound 

for the running time of Propp- Wilson's Coupling From The Past algorithm. 



1 Lattice Paths with Constraints 

, Lattice paths arise in several areas in probability and combinatorics, either in their own interest 

^■f-^ ' (as realizations of random walks, or because of their interesting combinatorial properties: see [T] 

OO . for the latter) or because of fruitful bijections with various families of trees, tilings, words. The 

problem we discuss here is to efficiently sample uniform (or almost uniform) paths in a family 
of paths with constraints. 

CS| ' There are several reasons for which one may want to generate uniform samples of lattice 

, paths: to make and try conjectures on the behaviour of a large "typical" path, test algorithms 

running on paths (or words, trees,...). In view of random sampling, it is often very efRcient to 
make use of the combinatorial structure of the family of paths under study. In some cases, this 
yields linear-time (in the length of the path) ad-hoc algorithms [HPJ. However, the nature of 
the constraints makes sometimes impossible such an approach, and there is a need for robust 
}J] ' algorithms that work in lack of combinatorial knowledge. 

5^ \ Luby,Randall and Sinclair 11! design a Markov chain that generate sets of non-intersecting 

lattice paths. This was motivated by a classical (and simple, see illustrations in [4l[T4|) corre- 
spondence between dimer configurations on an hexagon, rhombae tilings of this hexagon and 
families of non-intersecting lattice paths. As the first step for the analysis of this chain, Wilson 
[13] introduces a peak/ valley Markov chain (see details below) over some simple lattice paths 
and obtain sharp bounds for its mixing time. We present in this paper a variant of this Markov 
chain, which is valid for various constraints and whose analysis is simple. It generates an "al- 
most" uniform path of length n in n^^^ steps, this bound makes use of a certain contraction 
property of the chain. 

Appart from the algorithmic aspect, the peak/valley process seems to have a physical rel- 
evancy as a simplified model for the evolution of quasicrystals (see a discussion on a related 
process in the introduction of [?). In particular, the mixing time of this Markov seems to have 
some importance. 
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Notations 




Figure 1: The lattice path S = (1, 2, 0, 1, 2, 3, 1) associated with the word (1, 1, -2, 1, 1, 1, -2). 

We fix three integers n,a,b > 0, and consider the paths of length n, with steps +a/ — b, 
that is, the words of n letters taken in the alphabet {a, —b}. Such a word s = (si, S2, . . . , Sn) is 
identified to the path S = (Si, . . . , Sn) := (si, si + S2, . . . , si + .S2 + • • ■ + Sn)- 

To illustrate the methods and the results, we focus on some particular sub-families An C 
{a, -6}": 

1. Discrete meanders, denoted by which are simply the non-negative paths: S G A^„ if 
for any i < nwe have Si > 0. This example is mainly illustrative because the combinatorial 
properties of meanders make it possible to perform exact sampling very efficiently (an 
algorithm running in 0(n^^^) steps is given in [2], an order that we cannot get in the 
present paper). 

2. Paths with walls. A path with a wall of height h between r and s is a path such that Si > h 
for any r < i < s (see Fig. [5] for an example). These are denoted by >V„ = W„(/i, r, s), 
they appear in statistical mechanics as toy models for the analysis of random interfaces 
and polymers (see examples in [7]). 

3. Excursions, denoted by £„, which are non- negative paths such that Sn — 0. In the case 
a = b = 1, these correspond to well-parenthesed words and are usually called Dyck words. 
In the general case, Duchon [6] proposes a rejection algorithm which generates excursions 
in linear time. 

4. Culminating paths of size n, denoted further by C„, which are non- negative paths whose 
maximum is attained at the last step: for any i we have < Si < Sn- They have 
been introduced in [2 , motivated in particular by the analysis of some algorithms in 
bioinformatics. 
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Figure 2: A path of steps -f 1/ — 2, with a wall of height h = 6 between z = 10 and j = 15. 
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2 Sampling with Markov chains 



We will consider Markov chains in a family An, where all the probability transitions are sym- 
metric. For a modern introduction to Markov chains, we refer to |8j. Hence we are given a 
transition matrix (pij) of size |^„| x |^„| with 

Pi.j — pj,i whenever i ^ j, 
Pi,t = 1 - ^Pi,j- 

Lemma 1. If such a Markov chain is irreducible, then it admits as unique stationary distribution 
the uniform distribution tt — 7r(^„) on An- 

Proof. The equality TT{i)pi_j — TT{j)pj^i holds for any two vertices z, j. This shows that the 
probability distribution tt is reversible for (pij), and hence stationary. It is unique if the chain 
is irreducible. □ 

This lemma already provides us with a scheme for sampling an almost uniform path in An, 
without knowing much about An- To do so, we define a "flip" operator on paths, this is an 
operator 

0: A> X X {|,t} X {+,-} ^ An 

{S,i,e,S) ^ 0(8, i, e, (5). 

When i G {1, 2, . . . , n — 1} the path 0(S, i, t, S) is defined as follows : if (si, Si+i) = (—6, a) — 



then these two steps are changed into (a, —6) — \. The n — 2 other steps remain 
unchanged. If {si, Si+i) (—6, a) then 4>{S, i, t)(5 — S. Note that in the case i G {1, 2, . . . , n — 1} 
the value of does not depend on S. 

For the case i = n, ii S = +, we define (j>{S, n, £)d as before as if there would be a +a as the 
end if the path. For instance, in the case where Sn = —b, the path 0(S,n,t)+, the n-th step is 
turned into a. 

A \/- 

The path 0(8, i,^,)^ is defined equally: if i < n and (si,Si+i) = *, it turns into V . 
When S = —, one flips as if there would be a —6 at the end of the path. 

For culminating paths, we have to take another definition of 0(S, n, ^)6, (f>(S, n, ],)d, see Sec- 
tion O 

We are also given a probability distribution p = {pi)i<i<n, and we assume that pi > for 
each i. We will consider a particular sequence p later on, but at this point we can take the 
uniform distribution in {1, . . . , n}. We describe the algorithm below in Algorithm [T] 



Algorithm 1 Approximate sampling of a path in A., 
initialize S — {+a, +a, +a, . . . , +a) 
Ii, l2, ■ ■ ■ i.i.d. r.v. with law p 
£i,e2, • • ■ <— i.i.d. uniform r.v. in {t, i} 
61,62, ■■ ■ i.i.d. uniform r.v. in {+, — } 
for t = 1 to T do 

if 0(8, /f , £t)(5f is in An then 
S^(l){S,It,et)6t 

end if 
end for 
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In words, this algorithm performs the Markov chain in An with transition matrix P = 
(-Pr,s)r seA defined as foUows: 

Pr,s — Pif^, if S 7^ R and S = 0(R, i, e)S for some i, e, S 
Pr,s = otherwise, 

Proposition 2. Denote by S{t) the random path obtained after the t-th run of the loop in 
Algorithm]^ When t — > oo, the sequence S{t) converges in law to the uniform distribution in 
An- Moreover, the execution of Algorithm{l\ until time T is linear in T. 

Proof. For the first claim, we have to check that the chain is aperiodic and irreducible. Aperi- 
odicity comes from the (many) loops. Irreducibility will follow from Lemma SI For the second 
claim, notice that the time needed for the test "(/)(S, /t, £t) is in An'' can be considered as con- 
stant, since for the families Mn and f„ we only have to compare Q^Si while for the family W„ 
we only have to compare Si with the height of the wall at i. For the case of the culminating 
paths, see below in Section [53] □ 

We now choose the distribution (pi). Instead of Pi — we will use the probability 

distribution defined by 

Pi :— i(2n — i)Ko + a ( for i — 1, . . . , n), (1) 

where 



Kq 
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a = 1 /An 

We let the reader check that {pi)i<n is indeed a probability distribution. The reason for which 
we use this particular distribution will appear in the proof of Proposition [5] We will then need 
the following relation: for each 1 < i < n — 1, 

Pi - Pi-l/2 - Pi+i/2 = Kq. (2) 

For Algorithm [T] to be efficient, we need to know how S{T) is close in law to tt. This question 
is related to the spectral properties of the matrix P. In particular, the speed of convergence is 
governed by the spectral gap {i.e. 1 — A, where A is the largest of the modulus of the eigenvalues 
different from one, see |10j for example), but this quantity is not known in general. Some 
geometrical methods [5] allow to bound from below 1 — A, but they assume a precise knowledge 
of the structure of the graph defined by the chain P. It seems that such results do not apply 
here. 

Instead, we will study the metric properties of the chain P with respect to a natural distance 
on An , and show that it satisfies a certain contraction property. 

2.1 The variant of Algorithm [1] for culminating paths 

Unchanged, our Markov chain P cannot generate culminating paths since the path S = (a, a, . . . , a) 
would then be isolated: it has no peak/valley and 0(S, n, J,)— = (a, a, ... , —b) which is not cul- 
minating. 
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Thus we propose a slight modification for the family C„. We only change the definition of 
<^(S, i, e)S when i = n (it won't depend on S). Since the maximum is reached at n, the \b/a \ + 1 
last steps are necessarily 

(a, a, ... , a) or (— &, a, . . . , a). 

We thus define (j){S,n,"[)5 as the path obtained by changing the \b/a] + 1 last steps into 
(a, a,..., a) (regardless of their initial values in S) and 4>{S,n, as the path obtained by 
changing the \b/a \ + 1 last steps into (—6, a, . . . , a). 

Notice that despite this change the execution time of each loop of Algorithm [1] is still a 0(1): 

• If It < n, the time needed for the test "(j){S,It,et)St is in An" can be considered as 
constant, since we only have to compare 0, Si^ S„. 

• li It — n, the new value S'„ is compared with the maximum of S, which can be done in 
0{n). Fortunately, this occurs with probability p„ = 0{l/n), so that the time-complexity 
of each loop is, on average, a 0(1). 

3 Error estimates with contraction 

Going back to a more general setting, we consider a Markov chain in a finite set V, endowed 
with a metric d. For a vertice x € V and a transition matrix P, we denote by PS^ (resp. P*Sx) 
the law of the Markov chain associated with P at time 1 (resp. t), when starting from x. For 
x,y € V, the main assumption made on P is that there is a coupling between PSx, PSy (that is, 

a random variable {Xi, Yi) with Xi PSx, Yi P5y) such that 

nd{Xi,Y^)]<{l~K)d{x,y), (3) 

for some k > 0, which is called the Ricci curvature of the chain, by analogy with the Ricci 
curvature in differential geometrj0. If the inequality holds, then it implies ([T0],p.l89) that P 
admits a unique stationary measure tt and that, for any x, 

II /''^-vr ||tv< (l-'*)*diam(F), (4) 

where diam(V^) is the diameter of the graph with vertices V induced by the Markov chain. 
The notation || . ||tv stands, as usual, for the Total Variation distance over the probability 
distributions on V defined by 

II Ml - ||tv:= sup \^il{A) ~ ^l2{A)\ . 

AdV 

Hence, a positive Ricci curvature gives the exponential convergence to the stationary measure, 
with an exact (i.e. Q is non-asymptotic in t) bound. In many situations, a smart choice for 
the coupling between Xi,X2 gives a sharp rate of convergence in eq. dH) (see some striking 
examples in |12)). 

3.1 Metric properties of P 

To apply the Ricci curvature machinery, we endow each An with the L^-distance 

di(5,5') = ^Ei^.-^.'i- 

^The Ricci curvature is actually the largest positive number such that ((3)l holds, for all the couplings of 
PSx, PSy ; here we should rather say that Ricci curvature is larger than k. 
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(Notice that \Si — Sl\ is always a multiple of a + b.) For our purpose, it is fundamental that this 
metric space is geodesic. 

Definition 3. A Markov chain P in a finite set V is said to be geodesic with respect to the 
distance d on V if for any two x,y ^ V with d{x,y) = k, there exist k + 1 vertices Xq — 
x,Xi, . . . ,Xk — y in V such that for each i 

• d{xi,Xi+i) = 1 ; 

• Xi and Xi^i are neighbours in the Markov chain P (i.e. P{xi, Xi+i) > 0). 

This implies in particular that P is irreducible and that the diameter of P is smaller than 
max^ ^yd{x,y). 

Lemma 4. For each family Cn,V\^n,£n,Mn the Markov chain of Algorithmic is geodesic with 
respect to di. 

Proof of Lemma\^ The proof goes by induction on k. We fix 5 ^ T (and denote by s,t the 
corresponding words) ; we want to decrease di{S,T) by one, by applying the operator with 
proper i, e. We denote by zq G {1, . . . , n} the first index for which S ^ T. For instance we have 
Tig = Si„ + a + b. Let j be the position of the left-most peak in T in {ip + 1, jg + 2, . . . , n}, 
if such a peak exists. Then S' :— (j){T,j,l.)S is also in An- it is immediate for the families 
Mn,Wn,Cn,£n- We havc di{S,S') = k-l. 

If there is no peak in T after zq, then (ij^+i, ■ • ■ , in) = (a, a, ... , a). Hence we try to 
increase the final steps of S by one. To do so, we choose S' :— (/)(S,ri,t)^ ii S 4'{S,n,^)S, or 
S' = 4>{S,j, t)5 where j is the position of the right-most —b otherwise (we choose the right-most 
one to ensure that 4>{S,j,^)S remains culminating in the case where An = C„.). 

□ 

For meanders, excursions and walls, we will show that the Ricci curvature of P with respect 
to the distance di is (at least) of order 

Proposition 5. For the three families Mn, W„, the Ricci curvature of the associated Markov 
chain, with weights (pi) defined as in ([T]), is larger than Kq. 



i — 1 i i + 1 



Proof of Proposition O 




Fix S,T in An G {Mn,Sn,yVn}, we first assume that S,T are neighbours, for instance 
T — <f>(S, i, t) for some i. Let (S^, S^) be the random variable in An x An whose law is defined 

by 

(Si,S2) ^^^^ {q^{S,I,E),cf>{T,I,E)), 

where I is a r.v. taking values in {1, . . . , n} with distribution p and E is uniform in {t,4^}. Li 
other words, we run one loop of Algorithm [T] simultaneously on both paths. 

We want to show that S^,S^ are, on average, closer than S,T. Different cases may occur, 
depending on I and on the index i where S, T differ. 
Case 1. i — 1,2, ... ,n — 2. 
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Case la. X = i. This occurs with probabihty pi and, no matter the value of we have 
Si = S2. 

Case lb. I — i — 1 orz + 1. We consider the case i — 1. Since S and T coincide everywhere 
but in i, we necessarily have one of these two cases: 

• there is a peak in S at z — 1 and neither a peak nor a valley in T at i — 1 (as in the figure 
on the right) ; 

• there is a valley in T at i — 1 and neither a peak nor a valley in S at i — 1. 

In the first case for instance, then we may have (ii(Si,S^) = 2 \i E =|, while the distance 
remains unchanged if E The case I = i + 1 is identical. This shows that with a probability 
smaller than pi_i/2 +pi+i/2 we have di(Si,S^) = 2. 

Case Ic. I^z — + l and X ^ n. In this case, S and T are possibly modified in I, but 
if there is a modification it occurs in both paths. It is immediate since for the families A^n,W'„ 
and En since the constraints are local. 

Case 2. i — n — 1. In this case, it is easy to check that, because of our definition of (/)(S, n, e)d, 
we have 

E [di(S\s2)] < l-pn-i +p„-2/2+p„/2 = l-Ko- 
Case 3. i — n. We have 

E [di {S\S^)] < 1 + p„-i/2 - p„/2 = l-Ko. 

Thus, we have proven that when S, T only differ at i 

E[rfi(Si,S2)] <2x (p,_i/2 + p,+i/2) + 0xp,; + lx (l-p,;-p,_i/2-ft+i/2) (5) 
< (l~Ko) X 1 = (1 ^ Ko)di{S,T). 

What makes Ricci curvature very useful is that if this inequality holds for pairs of neighbours 
then it holds for any pair, as noticed in 3,. Indeed, take fc + 1 paths So = S, Si, Sk = T as 
in Lemma |4] and apply the triangular inequality for di : 

k-l 

E [di {cf,{S, I, E) , 0(T, I,E))]<Y,E [di (0(5, , X, i;), 0(5,+i , I, E))] 

<(l-Ko)fc = (l-«o)di(^,r). 

□ 

Remark 6. It is easy to exhibit some S, T such that ineq. ([5]) is in fact an equality. In the case 
where pi — l/n, this equality reads E [di(Si,S'^)] = di{S,T), and we cannot obtain a positive 
Ricci curvature (though this does not prove that there is not another coupling or another distance 
for which we could get a k > in the case pi — l/n.). 

We recall that for each family An, diam(^„) — max(ii(S, T) < n(n+l)/2. Hence, combining 
Proposition [5}vith Eq. (j4]) gives our main result: 

Theorem 7. For meanders, excursions and path with walls, Algorithm [7] returns an almost 
uniform sample ofir, as soon as T ^ . Precisely, for any itinialization of Algorithm[ll 

II S(T) - n ||tv< diam(X)(l - < ^^^^^ exp ■ 
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Another formulation of this resuh is that the mixing time of the associated Markov chain, 
defined as usual by 

W jinf t > ; sup || P*<5, - tt ||tv< 1 (6) 

{e~^ is here by convention), is smaller than n^(n+ 1) logn. For culminating paths, the argument 
of Case Ic fails and ([5]) does not hold, we are not able to prove such a result as Theorem [71 
However, it seems empirically that the mixing time is also of order n^logn (with a constant 
strongly dependent on a,b). A way to prove this could be the following observation: take 
(S°,T°) — (S,T) two any culminating paths, and define 

where It,£t, are those in Algorithm[TJ The sequence (|| S* — T* ||oo)( is decreasing throughout 
the process. Unfortunately we cannot get a satisfactory bound for the time needed for this 
quantity to decrease by one. 

3.2 Related works 

Bounding mixing times via a contraction property over the transportation metric is quite a 
standard technique, the main ideas dating back to Dobrushin (1950's). A modern introduction 
is made in [lOj . For geodesic spaces, this technique has been developped in [3j under the name 
path coupling. 

As mentioned in the introduction, the Markov chain P on lattice paths with uniform weights 
Pi = 1/n has in fact already been introduced for paths starting and ending at zero (sometimes 
called bridges) in [1 IJ , and its mixing time has been estimated in |14) . Wilson also proves a 
mixing time of order n'^logn, by showing that ([3]) holds with a different distance (namely, a 
kind of Fourier transform of the heights of the paths jl. This is the concavity of this Fourier 
transform which gives a good mixing time, exactly as the concavity of our p^'s speeds up the 
convergence of our chain. 

Wilson's method is developped only for bridges in [M] and it is not completely straightfor- 
ward to use it when the endpoints are not fixed. For instance, take n = 7 and a — b — 1, and 

consider the paths + + H h + and h H . There are more "bad moves" (moves 

that take away these paths) than "good moves". 

4 Coupling From The Past with P 

Propp- Wilson's Coupling From The Past (CFTP) [1^ is a very general procedure for the ex- 
act sampling of the stationary distribution of a Markov chain. It is efficient if the chain is 
monotonous with respect to a certain order relation ^ on the set V of vertices, with two ex- 
tremal points denoted 0, 1 (i.e. such that ^ a; ^ 1 for any vertex x). This is the case here for 
each family C„,W„,f„,A4„ , with the partial order 

S ^ T iff S"; < T, for any i. 

For the family A4io with a = l,b — —2 for instance, we have 

= ^meanders ^ (1' 1' 1, 1, -2, 1, 1, -2, 1), 

1 = ^meanders ^ ^' ^' 

^Notice that a, b do not have the same meaning in Wilson's paper: a (resp. b) stands for the number of 
positive (resp. negative) steps. 
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We describe CFTP, with our notations, in Algorithm [2l 



Algorithm 2 CFTP: Exact sampling of a path in An 
S ^ 6, T ^ i 

. . . , 1-2, 1-i ^ i.i.d. r.v. with law p 

. . . , e_2, £-1 ^ i-i-d. uniform r.v. in {t, i} 

. . . , S-2, S-i <~ i.i.d. uniform r.v. in {+, — } 

T = 1 

repeat 

s ^ 6, T ^ i 

for t ~ — T to do 

if (p{S,It ,et) is in An then S ^ (j){S, It, £t)St 
if 0(T,/t,et) is in A then T ^ (f>{T, h, £t)St 

end for 

T ^ 2t 
until S = T 



We refer to ('8 ,Chap.lO) for a very clear introduction to CFTP, and we only outline here 
the reasons why this indeed gives an exact sampling of the stationary distribution. 

• The output of the algorithm (if it ever ends!) is the state of the chain P that has been 
running "since time — cxd", and thus has reached stationnarity. 

• The exit condition S = T ensures that it is not worth running the chain from T steps 
earlier, since the trajectory of any lattice path R 1 is "sandwiched" between those 
of 0, 1, and therefore ends at the same value. 



-T/2 



1 ^ 



output 



Figure 3: A sketchy representation of CFTP : trajectories starting from 0, 1 at time —T/2 don't 
meet before time zero, while those starting at time — T do. 



Proposition 8. Algorithm\^ends with probability 1 and returns an exact sample of the uniform 
distribution over An- For the families Wn,£n,.Mn, this takes on average 0{n^{logn)'^) time 
units. 

Let us mention that in the case where the mixing time is not rigorously known. Algorithm 
[2] (when it ends) outputs an exact uniform sample and therefore is of main practical interest 
compared to MCMC. 

Proof of Proposition [51 It is shown in |13j that Algorithm [2] returns an exact sampling in 
^(*mix ^) ^^^^ chain, where t^^^ is defined in ^ and H is the length of the longest 

chain of states between and 1. It is a consequence of the proof of Lemma |4] that H — 0{n^). 
We have seen that ij^ix ~ 0{n^ logn). (Recall that each test in Algorithm [2] takes, on average, 
0(1) time units.) □ 
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We recall that CFTP has a major drawback compared to MCMC. For the algorithm to be 
correct, we have to reuse the same random variables It,St,5t, so that space-complexity is in fact 
linear in n'^(logn)^. This may become an issue when n is large. 

5 Concluding remarks and simulations 

1. In Fig|31 we show simulations of the three kinds of paths, for a = 1, 6 = 2, n = 600. 
We observe that the final height of the culminating path is very low (about 30), it would be 
interesting to use our algorithm to investigate the behaviour of this height when n — ^ oo ; this 
question was left open in [2]. 




Figure 4: (Almost) uniform paths of length 600, with a — 1,6 = 2. From top to bottom: a 
culminating path, a meander, a path with wall (shown by an arch). 



2. One may wonder to what extent this work applies to other families An of paths. The main 
assumption is that the family of paths should be a geodesic space w.r.t. distance di. This is 
true for example if the following condition on An is fulfilled: 

(i?, T <=, An &nd R < S ^T) ^ S An. 

Notice however that this is quite a strong requirement, and it is not verified for culminating 
paths for instance. 

3. A motivation to sample random paths is to make and test guesses for some functionals of these 
paths, taken on average over An- Consider a function f : An ^ we want an approximate 
value of 7r(/) := card(^„)~-'^ /('^)' exact value is out of reach by calculation. We 
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estimate this quantity by 

'^(/):=^E/(SW), (7) 

t=i 

(recall that S{t) is the value of the chain at time t). For Algorithm [T] to be efBcient in practice, 
we have to bound 

P(k(/)-vr(/)| >r), (8) 

for any fixed r > 0, by a non- asymptotic (in T) quantity. This can be done with ([9j, Th.4-5), 
in which one can find concentration inequalities for ([8|). The sharpness of these inequalities 
depends on k and on the geometrical structure of An- 
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